Agglomerative Clustering of Bagged Data Using Joint Distributions
Abstract
Current methods for hierarchical clustering of data either operate on features of the data or make limiting model assumptions. We present the hierarchy discovery algorithm (HDA), a model-based hierarchical clustering method based on explicit comparison of joint distributions via Bayesian network learning for predefined groups of data. HDA works on both continuous and discrete data and offers a model-based approach to agglomerative clustering that does not require prespecification of the model dependency structure.
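The general idea of agglomerating predefined groups ("bags") by comparing their joint distributions can be illustrated with a minimal sketch. This is not the HDA algorithm itself. As a simplifying assumption, each bag's joint distribution is modelled here as a multivariate Gaussian rather than a learned Bayesian network, and bags are compared with a symmetrised Kullback-Leibler divergence. The function names `fit_gaussian`, `symmetric_kl`, and `agglomerate_bags` are hypothetical and do not come from the paper.

```python
import numpy as np


def fit_gaussian(bag, ridge=1e-6):
    """Estimate a joint Gaussian (mean, covariance) from one bag of samples.

    Assumption: a Gaussian stands in for the learned Bayesian network.
    A small ridge term keeps the covariance invertible.
    """
    mean = bag.mean(axis=0)
    cov = np.cov(bag, rowvar=False) + ridge * np.eye(bag.shape[1])
    return mean, cov


def symmetric_kl(p, q):
    """Symmetrised KL divergence between two multivariate Gaussians."""
    def kl(a, b):
        (m0, s0), (m1, s1) = a, b
        d = m0.shape[0]
        s1_inv = np.linalg.inv(s1)
        diff = m1 - m0
        _, logdet0 = np.linalg.slogdet(s0)
        _, logdet1 = np.linalg.slogdet(s1)
        return 0.5 * (np.trace(s1_inv @ s0) + diff @ s1_inv @ diff
                      - d + logdet1 - logdet0)
    return kl(p, q) + kl(q, p)


def agglomerate_bags(bags):
    """Greedily merge the two closest groups until one remains.

    Returns a list of (id_a, id_b, distance) tuples in merge order.
    Original bags have ids 0..n-1; each merged group gets the next free id.
    """
    clusters = {i: np.asarray(b, dtype=float) for i, b in enumerate(bags)}
    merges = []
    next_id = len(bags)
    while len(clusters) > 1:
        # Refit a model for every current group, including merged ones.
        models = {k: fit_gaussian(v) for k, v in clusters.items()}
        keys = sorted(clusters)
        dist, a, b = min(
            (symmetric_kl(models[a], models[b]), a, b)
            for i, a in enumerate(keys) for b in keys[i + 1:]
        )
        # Pool the samples of the two closest groups into a new group.
        clusters[next_id] = np.vstack([clusters.pop(a), clusters.pop(b)])
        merges.append((a, b, dist))
        next_id += 1
    return merges
```

Given three bags where the first two are drawn from the same distribution and the third from a shifted one, this sketch merges the first two bags before joining the third. Each bag needs at least two samples for the covariance estimate to be defined.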
Related resources
Clustering Large Data Sets Described With Discrete Distributions and An Application on TIMSS Data Set
Symbolic Data Analysis is based on special descriptions of data: symbolic objects. Such descriptions preserve more detailed information about data than the usual representations with mean values. A special kind of symbolic object is a representation with distributions. In the clustering process this representation enables us to consider variables of all types at the same time. We pres...
Clustering large data sets described with discrete distributions and its application on TIMSS data set
Symbolic Data Analysis is based on special descriptions of data: symbolic objects. Such descriptions preserve more detailed information about data than the standard representations with mean values. A special kind of symbolic object is a representation with distributions. In the clustering process this representation enables us to consider variables of all types at the same time. We p...
Hierarchical clustering of word class distributions
We propose an unsupervised approach to POS tagging where first we associate each word type with a probability distribution over word classes using Latent Dirichlet Allocation. Then we create a hierarchical clustering of the word types: we use an agglomerative clustering algorithm where the distance between clusters is defined as the Jensen-Shannon divergence between the probability distributions...
Classifying qualitative information using centroid based methods
In this paper we consider the problem of classifying qualitative data. We present an approach to overcome several difficulties that SAHN (sequential, agglomerative, hierarchic, nonoverlapping) clustering methods exhibit when classifying qualitative data using centroids to compute similarities between pairs of classes. The approach is based on a recently proposed class of qualitative weighte...